[Feature][EPLB] Add eplb support for Qwen3 #20815
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Summary of Changes
Hello @aladerran, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces EPLB support for the Qwen3 model, enhancing its performance and scalability. The changes primarily focus on integrating EPLB into the model's MoE layers and adjusting the weight loading process to handle expert-specific weights. The test plan includes running a script to verify the functionality, and the test results show successful execution with EPLB enabled.
Highlights
- EPLB Support: Adds Expert Parallel Load Balancing (EPLB) support for the Qwen3 model, addressing issue #20468.
- FusedMoE Integration: Integrates EPLB functionality into the FusedMoE layer, including handling of redundant experts and expert mapping.
- Weight Loading: Modifies weight loading logic to accommodate expert weights and ensure correct mapping of weights to experts.
- Model Configuration: Adds configuration options for enabling EPLB and setting the number of redundant experts.
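For orientation, here is a minimal sketch of how these options might be enabled when constructing an engine. The model name, GPU count, and expert counts are placeholder assumptions, not values taken from this PR:

```python
# Hedged sketch, not taken from this PR: model name, GPU count, and the
# number of redundant experts are placeholder assumptions.
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen3-30B-A3B-FP8",
    tensor_parallel_size=4,
    enable_expert_parallel=True,   # EPLB builds on expert parallelism
    enable_eplb=True,              # enable expert-parallel load balancing
    num_redundant_experts=8,       # extra expert replicas used for rebalancing
)
```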
Code Review
This pull request adds EPLB support for the Qwen3 MoE model. The changes propagate EPLB configurations and update the weight loading logic to handle redundant experts. The implementation of the MixtureOfExperts interface is also mostly complete. I've added one comment to ensure full compliance with the interface protocol.
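For context, the MixtureOfExperts interface mentioned above is a protocol the model class must satisfy for EPLB. A simplified sketch of its shape follows; attribute and method names here are paraphrased from memory rather than copied from vLLM's source:

```python
# Simplified, paraphrased sketch of the protocol shape; consult
# vllm/model_executor/models/interfaces.py for the authoritative definition.
from typing import Protocol

import torch


class MixtureOfExperts(Protocol):
    expert_weights: list            # per-MoE-layer iterables of expert weights
    num_moe_layers: int
    num_logical_experts: int        # experts as defined by the checkpoint
    num_physical_experts: int       # logical experts plus redundant replicas
    num_redundant_experts: int

    def set_eplb_state(
        self,
        expert_load_view: torch.Tensor,
        logical_to_physical_map: torch.Tensor,
        logical_replica_count: torch.Tensor,
    ) -> None:
        """Bind EPLB's load statistics and expert placement to the model."""
        ...
```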
This pull request has merge conflicts that must be resolved before it can be merged.
This pull request has merge conflicts that must be resolved before it can be merged.
Hi @abmfy, @DarkLight1337, I've completed testing and rebased the PR onto the latest base branch. Thank you!
Sure, really appreciate the contributions! I just got back from traveling and will review it soon. Sorry for the delay!
Referenced snippet:

```python
params_dict = dict(self.named_parameters())
loaded_params: set[str] = set()
expert_params_mapping = self.get_expert_mapping()
```
This line overrides the previous expert_params_mapping binding. I suggest refactoring lines L435–L442 into the get_expert_mapping method.
Additionally, since the current weight loading process does not account for redundant experts, I suspect there might be some issues in the weight loading logic. I’ll run some accuracy tests to verify.
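A possible shape for that refactor, using FusedMoE's mapping helper; the checkpoint weight names and config attribute here are assumptions, not the PR's exact code:

```python
# Illustrative sketch of the suggested refactor; argument values are
# assumptions, not copied from the PR.
from vllm.model_executor.layers.fused_moe import FusedMoE


class Qwen3MoeForCausalLM:
    ...

    def get_expert_mapping(self) -> list[tuple[str, str, int, str]]:
        # Build the (param_name, weight_name, expert_id, shard_id) tuples in
        # one place, so load_weights no longer rebuilds or shadows the mapping.
        return FusedMoE.make_expert_params_mapping(
            ckpt_gate_proj_name="gate_proj",
            ckpt_down_proj_name="down_proj",
            ckpt_up_proj_name="up_proj",
            num_experts=self.config.num_experts,
        )
```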
Refactored. Do you have any suggestions, or is any help needed, for verifying the weight loading process?
Referenced diff:

```diff
 # Skip loading extra parameters for GPTQ/modelopt models.
-if name.endswith(
+if name_mapped.endswith(
         ignore_suffixes) and name not in params_dict:
```
Should name be changed to name_mapped on this line as well?
Originally, this function assumed that name would be replaced in place. However, since we're now creating a new variable, should the usage of name here be updated to name_mapped instead? (It looks like the change might have gone in the opposite direction.)
Ahh, I guess I rushed through. Thanks for pointing it out.
Resolved.
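For clarity, the resolved logic presumably uses the remapped name consistently. A minimal, self-contained illustration (placeholder names, not the merged diff):

```python
# Minimal illustration of the fix discussed above: once a new variable holds
# the remapped name, every subsequent check must use it, not the old name.
def should_skip(name_mapped: str,
                params_dict: dict,
                ignore_suffixes: tuple[str, ...]) -> bool:
    # Skip loading extra parameters for GPTQ/modelopt models.
    return (name_mapped.endswith(ignore_suffixes)
            and name_mapped not in params_dict)
```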
@abmfy Thanks for the review! Will respond tonight.
DarkLight1337 left a comment:
As discussed, tests will be added in #21290
@DarkLight1337 Thanks for the heads-up. I'll follow up in https://github.com/vllm-project/vllm/pull/21290. By the way, I checked the CI logs, and it seems the failure isn't caused by this PR.
@aladerran Thanks for your reply. We will add tests later in #21290.
nice

nice done
@mgoin @aladerran this change caused a regression in loading Qwen3 Coder 480B. There's a key error with …
Thanks for reporting @koush, would you mind opening an issue so we can track it for the upcoming release?
Essential Elements of an Effective PR Description Checklist
- Documentation update, such as supported_models.md and examples for a new model.

Purpose
In response to #20468, add EPLB support for the Qwen3-fp8 model.
Test Plan
Running:

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python path_to_this_script.py
```
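The script itself is not reproduced here; a minimal stand-in that exercises EPLB on four GPUs might look like the following, where the model name and prompt are assumptions rather than the author's actual test script:

```python
# Hypothetical stand-in for path_to_this_script.py; the model and settings
# are placeholders rather than the PR author's actual test script.
from vllm import LLM, SamplingParams

if __name__ == "__main__":
    llm = LLM(
        model="Qwen/Qwen3-30B-A3B-FP8",
        tensor_parallel_size=4,
        enable_expert_parallel=True,
        enable_eplb=True,
    )
    outputs = llm.generate(
        ["The capital of France is"],
        SamplingParams(temperature=0.0, max_tokens=32),
    )
    print(outputs[0].outputs[0].text)
```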
Test Result
(Optional) Documentation Update